Leave-one-out Word Alignment without Garbage Collector Effects
نویسندگان
چکیده
Expectation-maximization algorithms, such as those implemented in GIZA++ pervade the field of unsupervised word alignment. However, these algorithms have a problem of over-fitting, leading to “garbage collector effects,” where rare words tend to be erroneously aligned to untranslated words. This paper proposes a leave-one-out expectationmaximization algorithm for unsupervised word alignment to address this problem. The proposed method excludes information derived from the alignment of a sentence pair from the alignment models used to align it. This prevents erroneous alignments within a sentence pair from supporting themselves. Experimental results on Chinese-English and Japanese-English corpora show that the F1, precision and recall of alignment were consistently increased by 5.0% – 17.2%, and BLEU scores of end-to-end translation were raised by 0.03 – 1.30. The proposed method also outperformed l0-normalized GIZA++ and Kneser-Ney smoothed GIZA++.
منابع مشابه
Conservative Garbage Collection on Distributed Shared Memory Systems
In this paper we present the design and implementationof a conservative garbage collection algorithm for distributed shared memory (DSM) applications that use weakly–typed languages like C or C++, and evaluate its performance. In the absence of language support to identify references, our algorithm constructed a conservative approximation of the set of cross–node references based on local infor...
متن کاملDesigning a Concurrent Hardware Garbage Collector for Small Embedded Systems
Today more and more functionality is packed into all kinds of embedded systems, making high-level languages, such as Java, increasingly attractive as implementation languages. However, certain aspects, essential to high-level languages are much harder to address in a low performance, small embedded system than on a desktop computer. One of these aspects is memory management with garbage collect...
متن کاملGenerational Garbage Collection for Lazy Functional Languages without Temporary Space Leaks
Generational garbage collection is an established method for creating eecient garbage collectors. Even a simple implementation where all nodes that survive one garbage collection are tenured, i.e., moved to an old generation, works well in strict languages. In a lazy language, however, such an implementation can create severe temporary space leaks. The temporary space leaks appear in programs t...
متن کاملGenerational Garbage Collection without Temporary Space Leaks for Lazy Functional Languages
Generational garbage collection is an established method for creating eecient garbage collectors. Even a simple implementation where all nodes that survive one garbage collection are tenured, i.e., moved to an old generation , works well in strict languages. In lazy languages, however, such an implementation can create severe temporary space leaks. The temporary space leaks appear in programs t...
متن کاملGarbage Collecting Reactive Real-Time Systems
As real-time systems become more complex, the need for more sophisticated runtime kernel features arises. One such feature that substantially lessens the burden of the programmer is automatic memory management, or garbage collection. However, incorporating garbage collection in a real-time kernel is not an easy task. One needs to guarantee, not only that sufficient memory will be reclaimed in o...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015